Rhythmic unit extraction and modelling for automatic language identification
Abstract
This paper deals with an approach to automatic language identification based on rhythmic modelling. Besides phonetics and phonotactics, rhythm is one of the most promising features to consider for language identification, even if its extraction and modelling are not straightforward. One of the main problems to address is what to model. In this paper, a rhythm extraction algorithm is described: using a vowel detection algorithm, rhythmic units related to syllables are segmented. Several parameters are extracted (consonantal and vowel duration, cluster complexity) and modelled with a Gaussian mixture. Experiments are performed on read speech for seven languages (English, French, German, Italian, Japanese, Mandarin and Spanish), and results reach up to 86 ± 6% correct discrimination between stress-timed, mora-timed and syllable-timed classes of languages, and 67 ± 8% correct language identification on average for the seven languages with utterances of 21 s. These results are discussed and compared with those obtained with a standard acoustic Gaussian mixture modelling approach (88 ± 5% correct identification for the seven-language identification task). © 2005 Published by Elsevier B.V.
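The segmentation step described in the abstract can be illustrated with a minimal sketch. It assumes the vowel detector has already produced a sequence of segments labelled "C" (consonantal) or "V" (vowel) with durations in milliseconds, and that each rhythmic unit consists of the consonantal segments preceding a vowel plus that vowel. Both the input format and this grouping rule are assumptions made for illustration; the abstract only states that the units are "related to syllables". The names `RhythmicUnit` and `extract_rhythmic_units` are hypothetical, and the Gaussian mixture modelling stage is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical input format: (label, duration_ms), label is "C" or "V".
Segment = Tuple[str, int]


@dataclass
class RhythmicUnit:
    consonant_duration: int  # total duration of the consonantal cluster (ms)
    vowel_duration: int      # duration of the vowel that closes the unit (ms)
    complexity: int          # number of consonantal segments in the cluster


def extract_rhythmic_units(segments: List[Segment]) -> List[RhythmicUnit]:
    """Group labelled segments into units of zero or more C followed by one V.

    Assumption: a unit ends at each vowel; consonants left over at the
    end of the utterance (no following vowel) are dropped in this sketch.
    """
    units: List[RhythmicUnit] = []
    cluster_duration = 0
    cluster_size = 0
    for label, duration in segments:
        if label == "C":
            # Accumulate the consonantal cluster until a vowel is reached.
            cluster_duration += duration
            cluster_size += 1
        elif label == "V":
            # A vowel closes the current unit and resets the cluster.
            units.append(RhythmicUnit(cluster_duration, duration, cluster_size))
            cluster_duration = 0
            cluster_size = 0
        else:
            raise ValueError(f"unknown segment label: {label!r}")
    return units


if __name__ == "__main__":
    example = [("C", 50), ("C", 30), ("V", 100), ("V", 80),
               ("C", 40), ("V", 120), ("C", 60)]
    for unit in extract_rhythmic_units(example):
        print(unit)
```

Each unit yields the three parameters named in the abstract (consonantal duration, vowel duration, cluster complexity). These three-dimensional vectors are what a per-language Gaussian mixture would then be trained on.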
Similar resources
Automatic Modelling of Rhythm and Intonation for Language Identification
This paper deals with an approach to Automatic Language Identification using only prosodic modeling. The traditional approach for language identification focuses mainly on phonotactics because it gives the best results. Recent studies reveal that humans use different levels of perception to identify a language, in particular prosodic cues. Among prosodic features, rhythm is known to carry a sub...
Rhythmic Unit Extraction and Modelling for Automatic Language Identification
Authors: Jean-Luc Rouas, Jérôme Farinas, François Pellegrino, Régine André-Obrecht. 1 Institut de Recherche en Informatique de Toulouse, UMR 5505 CNRS – Institut National Polytechnique de Toulouse – Université Paul Sabatier – Université Toulouse 1, France. 2 Laboratoire Dynamique Du Langage, UMR 5596 CNRS – Université Lumière Lyon 2, France ...
Can Automatically Extracted Rhythmic Units Discriminate among Languages?
This paper deals with rhythmic modeling and its application to language identification. Besides phonetics and phonotactics, rhythm is actually one of the most promising features to be considered for language identification, but significant problems are unresolved for its modeling. In this paper, an algorithm dedicated to rhythmic segmentation is described. Experiments are performed on read speec...
Kohonen Self Organizing for Automatic Identification of Cartographic Objects
Automatic identification and localization of cartographic objects in aerial and satellite images have gained increasing attention in recent years in digital photogrammetry and remote sensing. Although the automatic extraction of man made objects in essence is still an unresolved issue, the man made objects can be extracted from aerial photos and satellite images. Recently, the high-resolution s...
Automatic rhythm modeling for language identification
This paper deals with an approach to Automatic Language Identification based on rhythmic modeling. Besides phonetics and phonotactics, rhythm is actually one of the most promising features to be considered for language identification, but significant problems are unresolved for its modeling. In this paper, an algorithm of rhythm extraction is described. Experiments are performed on read speech f...
Journal title:
- Speech Communication
Volume 47
Publication date: 2005